Loadstar 242 / 242.d81 / t.pack about
S T A R P A C K E R 1 . 2
by Lee Novak
[ALL SECRETS REVEALED]
[--------------------]
This article is for anyone who is
interested in how this packer works.
While the actual code was very
difficult to write, the ideas behind
it are rather simple.
You might think: "How can a file
be compressed? How can you construct
a file without having all the data?"
It doesn't seem possible.
The idea I got from LS #135's
article on LZS COMPRESSION was that
any duplicated data can be packed.
What's needed is a way to signal when
something is to be repeated. You also
need to tell the unpacker where the
original data is, and how much of it
to repeat.
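As an illustration of that idea only (this is Python, not the packer's actual 6502 code, and the names are mine), a back-reference copy looks like this:

```python
def copy_back(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes copied from `distance` bytes back in
    the output built so far. Copying byte-by-byte lets the copy
    overlap itself, so a distance of 1 just repeats the last byte."""
    start = len(out) - distance
    for i in range(length):
        out.append(out[start + i])

out = bytearray(b"abcabc")
copy_back(out, 3, 3)   # repeat the "abc" found 3 bytes back
```

All the unpacker ever needs from the packed stream is that pair of numbers: how far back, and how many.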
How you go about this doesn't
matter, just as long as it works! The
LZS article talks of an unpacker that
reads from a stream of bits only.
This means that specific information
(where repeat data is, etc.) might be
split up between two or more bytes.
I'm guessing that this method of
varying bit-lengths provides the best
compression, and maybe that's why the
BIT IMPLODER is so good. But it sure
sounded difficult. I didn't consider
using this method for long.
[THE THEORY OF RARES]
[-------------------]
I knew this packer was going to
use simpler byte-sized pieces of
data. Every byte the unpacker would
read in could have any value 0-255.
At least one value must be reserved
to tell the unpacker that compression
information is to follow.
What value to use? Zero? No way!
There are lots of zeroes in almost
every file! What about a number that
is hardly ever used? Better. But some
files would use this number more than
others. What was needed was a custom
value for each file.
How about counting the number of
occurrences for each byte? Aha! The
least common byte, the RARE byte, is
to signal when compression is used.
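Finding the RARE bytes is just a frequency count. A sketch in Python (the function name and byte-count are mine; the packer itself picks eight RARES, as described below):

```python
from collections import Counter

def find_rares(data: bytes, how_many: int = 8) -> list:
    """Return the `how_many` least common byte values, rarest
    first. Values that never occur in the file at all are the
    best candidates of all."""
    counts = Counter(data)
    # include every possible value so unused bytes count as zero
    all_counts = [(counts.get(b, 0), b) for b in range(256)]
    all_counts.sort()
    return [b for _, b in all_counts[:how_many]]

# bytes 0 and 1 never occur in this sample, so they win
rares = find_rares(b"aaabbc", 2)
```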
The unpacker would read in the
byte following the RARE and know it
has special meaning. But there are so
many possible ways that information
might be packed. How would you tell
the unpacker to duplicate five bytes
(matches) located 117 bytes back?
If two bytes followed the RARE,
you COULD represent this information.
But then, how would you tell the
unpacker to copy 37 bytes located
9155 bytes back? We surely don't want
to pass up the savings there! Hmm,
maybe we need three bytes to follow
the RARE. One byte would represent
the number of matches, and the next
two tell how far back to get them.
So now, we would need FOUR bytes
to signify even the shortest packing:
one RARE then three bytes. So, what
advantage is there in "compressing" a
string of four matches? None at all.
But there's bound to be lots and lots
of four-byte matches in a file! Can't
we use ANY of them?
Hey, what if we use another RARE?
The unpacker could check if a byte is
either of these RARES, and follow a
different unpacking formula for each.
Why not three RARES? Okay! Why not
four, or five, or even twenty RARES?
Two problems with this. One: the
unpacker code grows with each RARE
you decide to add, because each RARE
represents a compression all its own.
And two: all normal, uncompressed
bytes are just written to memory as-
is. But sometimes, an uncompressed
RARE is part of the file. On these
occasions, TWO bytes are needed to
represent that one byte. A RARE to
say "Hey, here comes a rare!" and one
byte to say "We want this one!".
The second RARE is slightly less
"rare" than the first. The third is
even less so. Eventually, the bytes
may become so common that it costs
MORE to represent them as two bytes
each than the potential savings using
another type of compression may add.
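That trade-off can be put as a one-line rule of thumb (my formulation, not a routine from the packer): a RARE is worth keeping only while the bytes it saves exceed the one extra escape byte each of its literal occurrences now costs.

```python
def rare_net_savings(literal_count: int, uses: int,
                     bytes_saved_per_use: int) -> int:
    """Net bytes saved by adding one more RARE: each compressed
    use saves some bytes, but every literal occurrence of that
    value must now be written as TWO bytes instead of one."""
    return uses * bytes_saved_per_use - literal_count

# 20 uses saving 1 byte each, but 50 literal occurrences: a loss
loss = rare_net_savings(literal_count=50, uses=20, bytes_saved_per_use=1)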
[WHAT EACH RARE DOES]
[-------------------]
I did a lot of experimentation
using different numbers of RARES and
ways of packing the information. The
most favorable all-purpose method of
packing turned out to be one that
uses the top eight RARES.
Pretend you're an unpacker. You
read along, splatting uncompressed
bytes down without thought. Then you
come across one of these RARE bytes.
It's time to take the next byte, or
bytes, more seriously. Here's EXACTLY
what to do in these situations!
RARE1, BYTE1:
% 0000 0000
  \_______/
3 matches, 3-258 bytes away
Every possible value for BYTE1 is
useful. ONE byte is saved for each
3-byte match. There were so many of
these that a second RARE was added:
RARE2, BYTE1:
% 0000 0000
  \_______/
3 matches, 259-514 bytes away
Gathering 3-byte matches wasn't
profitable after that, so all others
are ignored.
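Pretending to be the unpacker again, the RARE1/RARE2 cases decode like this (a Python sketch under the ranges above; names are mine):

```python
def decode_short(out: bytearray, rare_index: int, byte1: int) -> None:
    """RARE1: 3 matches, 3-258 bytes away.
    RARE2: 3 matches, 259-514 bytes away.
    BYTE1 (0-255) is added to the base distance."""
    base = 3 if rare_index == 1 else 259
    start = len(out) - (base + byte1)
    for i in range(3):
        out.append(out[start + i])

out = bytearray(b"xyzabc")
decode_short(out, 1, 3)   # copy 3 bytes from 6 bytes back
```

Two bytes in the stream stand for three bytes of data, so each one saves exactly one byte, as the article says.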
RARE3, BYTE1:
% 0000 0000
  \_______/
4 matches, 4-259 bytes away
Each one of these saves TWO
bytes. There are still larger matches
close to the target string, but they
are becoming less common. Time to
break down the data byte into bits:
RARE4, BYTE1:
  +--------- 0 = 5 matches
% 0000 0000  1 = 6 matches
   \______/
5-132 bytes away
or 6-133 bytes away
RARE5, BYTE1:
  +--------- 0 = 7 matches
% 0000 0000  1 = 8 matches
   \______/
7-134 bytes away
or 8-135 bytes away
These 2-byte compressions save
between 3 and 6 bytes apiece. Things
are thinning out even more now, but
we still want anything that will make
a profit. Let's add a data byte:
RARE6, BYTE1, BYTE2:
% 0000 0000  % 0000 0000
  \__/ \__/    \_______/
   |   high    low byte
   |   nybble
   |   \_______________/
   |   0-4095 bytes away
   |
 4-19 matches
RARE7, BYTE1, BYTE2:
% 0000 0000  % 0000 0000
  \__/ \__/    \_______/
   |   high    low byte
   |   nybble
   |   \_______________/
   |   4096-8191 bytes away
   |
 4-19 matches
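Decoding those two-nybble cases is a matter of splitting BYTE1, a sketch under the layout above (again Python, with names of my own choosing):

```python
def decode_medium(out: bytearray, rare_index: int,
                  byte1: int, byte2: int) -> None:
    """RARE6/RARE7: the high nybble of BYTE1 gives 4-19 matches;
    the low nybble of BYTE1 is the high nybble of the distance
    and BYTE2 is its low byte. RARE7 adds 4096 to the distance."""
    matches = (byte1 >> 4) + 4
    distance = ((byte1 & 0x0F) << 8) | byte2
    if rare_index == 7:
        distance += 4096
    start = len(out) - distance
    for i in range(matches):
        out.append(out[start + i])

out = bytearray(b"hello world")
decode_medium(out, 6, 0x00, 11)   # 4 matches, 11 bytes back
```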
These 3-byte compressions save
between 1 and 16 bytes apiece. We
still need one more RARE to handle
the rest of the possibilities:
RARE8, BYTE1 (0):
End of compression. RUN program.
RARE8, BYTE1 (1-8):
Add a specific uncompressed RARE.
RARE8, BYTE1 (9), BYTE2:
This unusual type of compression
is for font data ONLY. If the exact
REVERSE of the target string can be
found exactly 1K earlier in the file,
just like reversed font data, then we
go back to copy 4-255 REVERSE bytes
as specified in BYTE2.
This EOR compression is probably
only useful to STAR LINKED files. The
cost of the related unpacking code is
34 bytes. This cost is recovered with
the first 5 characters that can be
packed this way!
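I read "REVERSE" here as C64 reverse video, i.e. each byte EORed with $FF, which is why the article calls it EOR compression; a font and its reversed twin sitting 1K apart differ only by that inversion. A sketch under that assumption:

```python
def decode_rvs_font(out: bytearray, count: int) -> None:
    """RARE8 type 9: copy `count` bytes from exactly 1K (1024
    bytes) back, inverting each with EOR $FF -- the reversed
    image of font data seen 1K earlier."""
    start = len(out) - 1024
    for i in range(count):
        out.append(out[start + i] ^ 0xFF)

out = bytearray(1024)        # 1K of zeroes stands in for a font
decode_rvs_font(out, 4)      # append the inverted bytes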
RARE8, BYTE1 (10-127), BYTE2, BYTE3:
%00000000  %00000000  %00000000
 |\______/  \_______/  \_______/
 |  5-122   high byte  low byte
 | matches  \_________________/
 |          0-65535 bytes away
 always 0
All remaining profitable matches
are compressed this way. Since using
this method costs 4 bytes each time,
at least 5 characters must be packed
with it to save at least one byte.
RARE8, BYTE1 (128-255), BYTE2:
%00000000  %00000000
 |\______/  \_______/
 |  4-131   byte to
 | repeats  repeat
 |
 always 1
This last one is known as RLE, or
Run-Length-Encoding. It can be pretty
useful for packing fonts, sprites, or
anything which may have 4 or more
identical bytes in a row.
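The RLE case decodes in a couple of lines, sketched here in Python under the bit layout above:

```python
def decode_rle(out: bytearray, byte1: int, byte2: int) -> None:
    """RARE8 with BYTE1 in 128-255: the low 7 bits of BYTE1
    give 4-131 repeats, and BYTE2 is the byte to repeat."""
    repeats = (byte1 & 0x7F) + 4
    out.extend(bytes([byte2]) * repeats)

out = bytearray()
decode_rle(out, 128, 0x41)   # minimum run: 4 copies of "A"
```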
Now that you know what does what, you
can make better sense of the
information on the packing screen.
Here are the
RARES one last time, with the
categories they fall into:
RARE1 - RARE5 = "short"
RARE6 - RARE7 = "medium"
RARE8 with 10-127 = "long"
RARE8 with 128-255 = "repeat"
RARE8 with 9 = "rvs font"
The number of uncompressed RARES
in the file isn't shown anywhere.
[TOIL AND TROUBLE]
[----------------]
Why is this packer so slow?
Actually it's not. You won't BELIEVE
how much work is going on internally!
The packer begins at the end of a
file and works backwards. It searches
for the MOST MATCHES as CLOSE TO the
target string as possible.
The packer begins this search for
each 3-byte string a certain distance
back from the string. How far back it
goes depends on the CRUNCH MODE:
Crunch Mode   How Far Back
-----------   ------------
     1        1 page  (256 bytes)
     2        2 pages (512 bytes)
     3        4 pages (1K)
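That search is where all the time goes: for every 3-byte string, the packer scans the whole window for the longest match, preferring the closest one on ties. A sketch of one such scan (Python, my names; the real thing is 6502 code working backwards through the file):

```python
def find_best_match(data: bytes, pos: int, window: int,
                    max_len: int = 19):
    """Scan up to `window` bytes back from `pos` for the longest
    match (capped at `max_len`), preferring the closest start on
    ties. Returns (length, distance)."""
    best_len, best_dist = 0, 0
    lo = max(0, pos - window)
    for start in range(pos - 1, lo - 1, -1):  # closest first
        length = 0
        while (length < max_len and pos + length < len(data)
               and data[start + length] == data[pos + length]):
            length += 1
        if length > best_len:   # strict: ties keep the closer one
            best_len, best_dist = length, pos - start
    return best_len, best_dist

best = find_best_match(b"abcabcabc", 3, 256)
```

Doing this for every position in the file, over windows of up to 1K, is why so much work is going on internally.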